[AMD][Quantization] Add TritonScaledMMLinearKernel since int8 is broken for AMD #12282
Signed-off-by: Randall Smith <[email protected]>
👋 Hi! Thank you for contributing to the vLLM project. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can do one of these:
Sorry for the disruption and thanks for the fix. Could you add an int8 test to the AMD runner to make sure this doesn't regress in the future?
@mgoin Where is the AMD runner?
I'm AFK right now, but you can look at the tests that have `mirror_hardwares: [amd]`; each of these will run on the AMD runner: https://github.com/vllm-project/vllm/blob/main/.buildkite/test-pipeline.yaml#L93
@mgoin I added a test to test_triton_scaled_mm.py that loads a small model and runs it. It only runs when the current platform is ROCm.
Thank you, looks good!
We aren't able to run int8 models on AMD anymore; they are completely broken. This is because there is no TritonScaledMMLinearKernel class.
I added TritonScaledMMLinearKernel. Since there is no Triton kernel that handles asymmetric int8 quantization, the new TritonScaledMMLinearKernel checks for this case and returns a failure.
I tested and everything seems to work.
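For readers unfamiliar with the pattern, here is a minimal sketch of the kind of check described above: a kernel class reports whether it can handle a given quantization config, and rejects asymmetric int8 with a reason. This is not the code from this PR. `ScaledMMConfig`, `TritonScaledMMSketch`, `can_implement`, and `choose_kernel` are hypothetical names chosen for illustration, and the config is reduced to the one field that matters here.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple, Type


@dataclass
class ScaledMMConfig:
    """Hypothetical stand-in for the layer config a kernel is asked to support."""
    input_symmetric: bool  # False means asymmetric (zero-point) activation quantization


class TritonScaledMMSketch:
    """Illustrative only; not the class added in this PR."""

    @classmethod
    def can_implement(cls, config: ScaledMMConfig) -> Tuple[bool, Optional[str]]:
        # Assumption from the PR description: no Triton kernel exists for
        # asymmetric int8, so report failure along with a reason.
        if not config.input_symmetric:
            return False, "asymmetric int8 quantization is not supported"
        return True, None


def choose_kernel(config: ScaledMMConfig, candidates: List[Type]) -> Type:
    """Return the first candidate that accepts the config, else raise with the reasons."""
    reasons = []
    for kernel in candidates:
        ok, reason = kernel.can_implement(config)
        if ok:
            return kernel
        reasons.append(f"{kernel.__name__}: {reason}")
    raise ValueError("no kernel can implement this config: " + "; ".join(reasons))
```

Returning a reason string alongside the boolean lets the caller surface a readable error when no kernel fits, rather than failing later inside the matmul.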